…ing a model. However, in the case of rare medical diseases, images from affected patients are much harder to come by than images from non-affected patients, resulting in unwanted class imbalance. Various approaches to tackling class imbalance have been explored so far, each with its fair share of drawbacks. In this research, we propose an outlier-detection-based image classification technique that can handle even the most extreme case of class imbalance. We have utilized a dataset of malaria-parasitized and uninfected cells. An autoencoder model titled AnoMalNet is first trained with only the uninfected cell images and then used to classify both affected and non-affected cell images by thresholding a loss value. We have achieved an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52%, respectively, performing better than large deep learning models and other published works. As our proposed approach can provide competitive results without needing disease-positive samples during training, it should prove useful for binary disease classification on imbalanced datasets.

Keywords: Anomaly detection, Autoencoder, Class imbalance, Classification, Malaria cell image

1. INTRODUCTION

Malaria is a menacing disease that has affected vast numbers of people in the past and continues to do so today. The statistics speak for themselves: 2020 saw a record-breaking estimate of 241 million malaria cases worldwide [1]. It is therefore imperative to work on remedies for such a deadly disease in order to mitigate the damage it inflicts.

Microscopic examinations of thick and thin blood smears are the most reliable and routinely used methods for diagnosing the disease. Thin blood smears help identify the species of the parasite causing the infection, whereas thick blood smears help detect the presence of parasites. However, the efficiency of this manual analysis heavily depends on the medical personnel carrying out the task, and each diagnosis takes a great deal of time. In the era of automation, where researchers are continuously working on fast and efficient ways of treating malaria, deep learning has become quite popular for the detection and analysis of malaria, as discussed in the literature review below. Beyond malaria classification, deep learning has established itself in several research fields such as medical image analysis, natural language processing, and audio processing [2]-[10].

Although deep learning techniques have proved useful in the recent past, most of these models are heavy and consume a lot of computational power, which is also an issue when deploying them to mobile or edge devices. Additionally, these methods do not show much potential when it comes to solving class imbalance issues. To address these issues, this research work introduces AnoMalNet, an autoencoder-based method for the investigation of malaria in cell images that is lighter and capable of handling class imbalance in datasets. In this research, the following contributions are made:

- An anomaly-detection-based approach built upon autoencoders is introduced for the investigation of malaria in cell images; it is designed to deal with class imbalance issues.
- The proposed model outperforms widely used models such as VGG16, ResNet50, MobileNetV2, and LeNet.
- A comparative analysis with other published methods is provided.


2. LITERATURE REVIEW 

There has been quite a lot of research in the field of malaria cell image classification. In the first part of this review, we discuss some of these works. Later, we briefly discuss various techniques that have been used to handle the class imbalance issue.

Raihan and Nahid used a biorthogonal wavelet to reduce the images to 72x72 resolution and extract features. The images were passed through a custom convolutional neural network (CNN) with three convolutional layers and three fully connected (FC) layers, and features were extracted from the first FC layer, yielding 768 features initially. The whale optimization algorithm (WOA) was then used to select an optimal subset of features, and these features were passed to the XGBoost algorithm. On the validation set, XGBoost achieved an accuracy, precision, recall, and F1 score of 94.92%, 94.34%, 95.57%, and 94.95%, respectively, for set 1 with 768 features, and 94.78%, 94.39%, 95.21%, and 94.80% for set 2 with 365 features. The XGBoost model construction time was halved for the second set owing to the reduced number of features. Shapley additive explanations (SHAP) was used as a model explainability tool to assess the importance of the features [12].

Narayanan et al. [13] resized the images to 50x50 resolution and applied a color consistency technique to maintain the same illumination conditions across all images. A fast CNN model with 6 convolution layers and 2 FC layers was deployed, along with AlexNet, ResNet, VGG-16, and DenseNet using transfer learning from ImageNet. Furthermore, a bag-of-features model with a support vector machine (SVM) was used. Among all the implemented models, DenseNet achieved the highest accuracy of 96.6%. Meanwhile, Reddy and Juliet used a pre-trained ResNet whose last layer was an FC layer with sigmoid activation. Apart from the last few layers, all the other layers were frozen during training. They achieved accuracies of 95.91% and 95.4% for training and validation, respectively. The authors mentioned a test set in the experiment but reported no results on it.

Rajaraman et al. [15] introduced a customized model with three convolution layers and two FC layers. AlexNet, Xception, ResNet, and DenseNet121 were used to extract features, and grid search was used for hyperparameter optimization. Each CNN used its default input resolution, and for the pre-defined architectures, the authors extracted features from different layers to determine the optimal layer for feature extraction in order to improve accuracy. With features extracted from the optimal layer, VGG16 and ResNet50 achieved the highest accuracy among all the tested models, 95.9%. In another study, thick blood smear images were collected [16], and 7,245 bounding box instances of plasmodium were annotated in 1,182 of those images. The images were divided into small patches and passed through a CNN to learn whether each patch contains an object of interest. The CNN was run on a 50/50 train-test split, achieving an area under the receiver operating characteristic curve (ROC AUC) of 1.00. The authors claimed that their process is quite efficient since it can learn directly from pixel data. Furthermore, Bibin et al. introduced a model based on a deep belief network (DBN) to classify 4,100 peripheral blood smear images into two classes. The DBN is pre-trained by stacking restricted Boltzmann machines and using the contrastive divergence approach. To train it, features were extracted from the images and used to initialize the DBN's visible variables; the feature vector combines color and texture attributes. With an F-score of 89.66%, a sensitivity of 97.60%, and a specificity of 95.92%, their method reportedly surpassed existing state-of-the-art methods significantly.

Lipsa and Dash used an optimal number and size of convolution and pooling layers in their CNN. The Adam optimizer was employed to train and validate the model in a case study on a malaria diagnosis dataset. Images were fed into the CNN without changing their size or color, and the model's performance was assessed. An architectural comparison was performed between the proposed CNN model and some popular CNN architectures, with the proposed model having fewer hyperparameters. This comparison demonstrated that the model requires far fewer evaluation parameters, making it a time-efficient and computationally economical approach while maintaining precise predictions. Nugroho and Nurfauzi used green, green, blue (GGB) color normalization as a preprocessing step in malaria detection. Their findings demonstrate that the method has greater sensitivity and consistently comparable precision across a number of intersection over union (IoU) thresholds for malaria identification. Finally, Tan et al. performed automated segmentation of Plasmodium falciparum, one of the five common types of malaria parasite, on thin blood smears using their proposed residual attention U-Net. When the trained system was applied to verified test data, it achieved an accuracy of 0.9687 and a precision of 0.9691.

Nearly all of the research discussed so far used conventional supervised learning methods on balanced datasets. As discussed earlier, class imbalance in medical image datasets is not uncommon, and various approaches have been taken to handle it. One prominent way is to generate synthetic data for the minority class through a generative adversarial network (GAN) and thereby balance the class distribution. However, as Mariani et al. mentioned, a GAN itself requires many images to generate synthetic data. When synthetic data must be generated for a sparsely represented class, it is therefore not realistic to expect good-quality results, since the GAN cannot be given a sufficient number of training images in the first place. Thus, although GANs can hypothetically solve class imbalance issues, they are hard to train in practice under such conditions. Another common way of handling class imbalance is data augmentation. Well-known augmentation techniques include geometric transformation, noise injection, color space transformation, image mixing, kernel filtering, cropping, and random erasing [22]. Some of these techniques, for example image mixing and kernel filtering, may completely distort an image and change the underlying feature space. Such feature transformation is generally unwanted for medical images, as images of different modalities come with a very specific set of features. Other augmentation tactics, such as geometric transformation, cropping, and noise injection, are quite limited in the variation they can create. Given the limitations of currently available approaches, the AnoMalNet architecture proposed in this paper can be useful.


3. METHOD 

Classifying malaria cell images as either parasite-infected or uninfected cells is a binary classification task. Traditional deep neural network (DNN) models can be used to solve this problem. In addition, it can easily be formulated as an outlier detection problem with the help of autoencoders. In the following subsections, several DNN models, namely LeNet, VGG16, ResNet50, and MobileNetV2, as well as autoencoders, are described.


3.1. Deep neural network models 

One of the earliest DNN models to be proposed is LeNet [23]. It consists of three convolutional layers with a kernel size of 5 and two average pooling layers, followed by two fully connected layers that act as the classifier. Compared to LeNet, VGG16 is a much deeper model comprising 13 convolutional layers and 3 fully connected layers [24], and it uses a smaller kernel size of 3. Theoretically, a deeper neural network should perform better than a shallow one; in practice, however, very deep networks suffer from the vanishing gradient problem. To address this problem, the ResNet model was proposed, which contains residual blocks [25]. These blocks contain skip connections that carry the output of earlier layers forward and add it to the output of later layers, which helps mitigate the vanishing gradient problem. MobileNetV2 is a CNN architecture specifically designed for mobile devices. It is built on an inverted residual structure in which residual connections link the bottleneck layers [26], and its intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.
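The skip connection can be illustrated with a minimal NumPy sketch of a single residual block computing ReLU(F(x) + x), where F is two small weight layers. The vector size, random weights, and function names are illustrative assumptions; this is a fully connected toy block, not the convolutional ResNet50 configuration used in the experiments.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Return relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x).

    The "+ x" term is the skip connection: the input bypasses both weight
    layers and is added back, so the layers only need to learn the residual F(x).
    """
    residual = w2 @ relu(w1 @ x)
    return relu(residual + x)


rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = 0.1 * rng.standard_normal((8, 8))

# With random weights the block adds a learned correction to x.
y = residual_block(x, w1, 0.1 * rng.standard_normal((8, 8)))
print(y.shape)  # (8,)

# With w2 = 0, F(x) = 0 and the block reduces to relu(x).
print(np.allclose(residual_block(x, w1, np.zeros((8, 8))), relu(x)))  # True
```

Setting the second weight matrix to zero makes F(x) vanish, so the block simply passes ReLU(x) through: the identity path carries the input regardless of the weight layers, which is why such connections help signals and gradients reach earlier layers.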

An autoencoder is a DNN model that learns to reproduce its input at its output. It consists of two parts: an encoder network followed by a decoder network. The encoder maps the input to a lower-dimensional representation, and the decoder takes this representation and reconstructs the original input from it. Through this encoder-decoder structure, an autoencoder can be used to create a sparse representation of the input data. For this research work, a custom convolutional autoencoder is used. The main difference between a convolutional autoencoder such as AnoMalNet and a regular autoencoder is that the former uses convolutional layers, whereas the latter uses regular feed-forward layers. The encoder network was built from three convolution layers with 4, 16, and 32 channels, respectively. The kernel size of all these convolution layers was set to 3x3 and the padding to 1, and each of them was followed by a ReLU activation function and a 2x2 max pooling layer.

The decoder network consisted of three transposed convolution layers with 32, 16, and 4 channels, respectively, each with a kernel size of 2x2 and a stride of 2. The outputs of all but the last transposed convolution layer were passed through a ReLU function, while the output of the final layer went through a sigmoid activation function. Figure 1 provides a graphical view of our custom autoencoder.
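As a sanity check on the layer configuration above, the sketch below traces the feature-map shape through the encoder and decoder using the standard output-size formulas, which match those documented for PyTorch's `Conv2d`, `MaxPool2d`, and `ConvTranspose2d` with default dilation and no output padding. It assumes a 1-channel 32x32 grayscale input (Section 3.2). Because the text lists the decoder widths as 32, 16, and 4 without saying whether these are input or output channels, the sketch assumes they are input channels, so that the last layer maps back to one channel; the function names are hypothetical.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a 2-D convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial output size of max pooling without padding."""
    return (size - kernel) // stride + 1


def conv_transpose2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a 2-D transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_autoencoder(size: int = 32, in_channels: int = 1) -> list[tuple[str, tuple[int, int, int]]]:
    """Return the (channels, height, width) shape after each layer block.

    Assumption: encoder widths 4, 16, 32 are output channels, and decoder
    widths 32, 16, 4 are input channels, so the last layer maps back to
    `in_channels` and the reconstruction matches the input shape.
    """
    shapes = [("input", (in_channels, size, size))]
    channels = in_channels
    for out_ch in (4, 16, 32):
        # 3x3 conv with padding 1 keeps the size; 2x2 max pooling halves it.
        size = maxpool2d_out(conv2d_out(size, kernel=3, padding=1))
        channels = out_ch
        shapes.append((f"conv{out_ch}+pool", (channels, size, size)))
    for in_ch, out_ch in ((32, 16), (16, 4), (4, in_channels)):
        if in_ch != channels:
            raise ValueError(f"expected {channels} input channels, got {in_ch}")
        # 2x2 transposed conv with stride 2 doubles the size.
        size = conv_transpose2d_out(size, kernel=2, stride=2)
        channels = out_ch
        shapes.append((f"tconv{in_ch}->{out_ch}", (channels, size, size)))
    return shapes


for name, shape in trace_autoencoder():
    print(f"{name:12s} {shape}")

print(trace_autoencoder()[-1])  # ('tconv4->1', (1, 32, 32))
```

Under these assumptions the bottleneck holds 32 x 4 x 4 = 512 values, half of the 1,024 pixels of a 32x32 grayscale input, and each transposed convolution exactly undoes one pooling step.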


Figure 1. Custom autoencoder: the encoder (Conv2d layers) maps the input image to a sparse representation, from which the decoder (transpose convolution layers) produces the reconstructed image


3.2. Proposed approach 

First, some simple preprocessing is applied to the image dataset. All images are resized to 32x32 and converted from the red green blue (RGB) color space to grayscale. Changing the color space does not create problems for this task, because the parasite remains visible in grayscale, and the conversion removes unnecessary noise from the images.
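A minimal NumPy sketch of this preprocessing is shown below. It assumes ITU-R BT.601 luma weights for the grayscale conversion (the weights Pillow documents for its RGB to "L" conversion), nearest-neighbour resizing, and scaling of 8-bit intensities to [0, 1] so that the targets match the decoder's sigmoid output. The text does not specify the conversion weights, the interpolation method, or the scaling, so all three are assumptions; in practice a library resize such as Pillow's `Image.resize` would typically be used.

```python
import numpy as np

# ITU-R BT.601 luma weights (assumed; the same weights Pillow uses for RGB -> "L").
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB array to an (H, W) grayscale array."""
    return rgb[..., :3].astype(np.float64) @ LUMA_WEIGHTS


def resize_nearest(img: np.ndarray, size: int = 32) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) array to (size, size)."""
    h, w = img.shape
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    return img[rows[:, None], cols[None, :]]


def preprocess(rgb: np.ndarray, size: int = 32) -> np.ndarray:
    """RGB -> grayscale -> size x size, with uint8 intensities scaled to [0, 1]."""
    gray = to_grayscale(rgb)
    return resize_nearest(gray, size) / 255.0


rng = np.random.default_rng(0)
# Synthetic stand-in for a cell image (the real images vary in size).
cell = rng.integers(0, 256, size=(142, 148, 3), dtype=np.uint8)
x = preprocess(cell)
print(x.shape)  # (32, 32)
```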

After this step, the custom autoencoder is trained using only uninfected cell images, with mean squared error (MSE) as the loss function for training the weights of the model. The proposed approach is based on the intuition that, during testing, this trained autoencoder will achieve a low loss score for uninfected cell images, whereas for parasite-infected cells it will output a significantly higher loss value. This is illustrated in Figure 2: Figure 2(a) shows the model being trained on uninfected cell images, and Figure 2(b) shows the model during inference on infected cell images. With the help of simple statistics, a cut-off point can then be established to label unknown cell images as infected or uninfected. An unknown image is classified as an outlier, that is, an infected cell, if the loss value obtained by passing it through the model is greater than the mean plus three times the standard deviation of the training loss. This is a standard statistical approach to determining whether a particular data point is an outlier.
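Written out, the per-image loss and decision rule described above take the following form. Here x is a preprocessed image with N pixels (N = 1,024 for a 32x32 grayscale image), \hat{x} is its reconstruction, and \mu and \sigma are the mean and standard deviation of the losses of a reference set of M uninfected images. These symbols are introduced for illustration, and the population form of \sigma is an assumption, since the text does not state which estimator was used.

```latex
\begin{aligned}
\mathcal{L}(x) &= \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2, \\
\mu &= \frac{1}{M}\sum_{j=1}^{M}\mathcal{L}\bigl(x^{(j)}\bigr), \qquad
\sigma = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\Bigl(\mathcal{L}\bigl(x^{(j)}\bigr) - \mu\Bigr)^2}, \\
\hat{y}(x) &=
\begin{cases}
\text{infected (outlier)}, & \mathcal{L}(x) > \mu + 3\sigma, \\
\text{uninfected}, & \text{otherwise.}
\end{cases}
\end{aligned}
```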


Figure 2. Proposed methodology (a) uninfected cell images for training and (b) infected cell images used for testing only (each panel: input image, sparse representation, reconstructed image)

Reconstructions of different types of cell images are shown in Figure 3. Figures 3(a) and (b) show original and reconstructed images of uninfected cells, while Figures 3(c) and (d) show original and reconstructed images of infected cells. These figures show that for uninfected cells, the reconstructed image is quite similar to the original. For parasite-infected cells, however, the reconstruction is poor: the model captures the shape of the cell to some extent but not the parasite inside it.


Figure 3. Visualization of (a) original uninfected cell, (b) reconstructed uninfected cell, (c) original infected cell, and (d) reconstructed infected cell


4. EXPERIMENTAL RESULTS AND ANALYSIS 

The models and the proposed approach described in the previous section were evaluated on a dataset for malaria parasite classification. The dataset is described in the following subsection, followed by a detailed discussion of the experimental setup, results, and analysis.


4.1. Dataset description 

The dataset used here was collected by the National Institutes of Health (NIH). It contains a total of 27,558 images, of which 13,779 are of cells infected with malaria parasites and the remaining 13,779 are of uninfected cells. All the images are in the RGB color space.


4.2. Experimental setup 

To train the AnoMalNet model, 1,607 randomly selected uninfected cell images were used, and 407 uninfected cell images were used to validate the model's performance. A total of 4,009 images were used to train and evaluate the performance of the AnoMalNet model. The MSE loss function was used along with the Adam optimizer with a learning rate of 0.01, and the model was trained for a total of 200 epochs. During the testing phase, a total of 5,512 images were used, of which 2,757 were parasite-infected and 2,755 were uninfected cell images. After training, the loss values of the validation set were used to calculate the mean and standard deviation, which were then used to create the decision threshold. As mentioned in Section 3.2, an image is labeled as a parasitized cell image if its loss value is greater than the mean plus three times the standard deviation. Using this threshold, all the images are classified with the help of the autoencoder.
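A minimal Python sketch of this thresholding step, using only the standard library, is given below. The function names and the loss values in the usage example are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, pstdev
from typing import Sequence


def fit_threshold(reference_losses: Sequence[float], k: float = 3.0) -> float:
    """Decision threshold = mean + k * standard deviation of reference losses.

    The reference losses are MSE values of uninfected images (the validation
    set here). Using the population standard deviation (pstdev) is an
    assumption; the text does not say which estimator was used.
    """
    return mean(reference_losses) + k * pstdev(reference_losses)


def classify(losses: Sequence[float], threshold: float) -> list[int]:
    """Label each image 1 (parasitized) if its loss exceeds the threshold, else 0."""
    return [int(loss > threshold) for loss in losses]


# Hypothetical per-image MSE values, for illustration only.
val_losses = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(val_losses)  # 0.011 + 3 * sqrt(2e-6), about 0.0152
test_losses = [0.012, 0.031, 0.015, 0.048]
print(classify(test_losses, threshold))  # [0, 1, 0, 1]
```

Because the threshold is fitted only on uninfected images, no parasitized sample is needed at any point before testing, which is what allows the approach to sidestep class imbalance.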

Several DNNs, namely LeNet, VGG16, ResNet50, and MobileNetV2, were trained for comparison with the proposed approach [23]-[26]. All of these models were trained on 22,046 images from the dataset containing both infected and uninfected cell images. Each model was trained for 100 epochs using the Adam optimizer with a learning rate of 0.01 and cross-entropy as the loss function.

4.3. Results and discussion


Loss vs epoch curves for all the trained models are shown in Figure 4, where Figures 4(a) and (b) show the training and testing loss, respectively. These curves show that after a certain number of epochs, the test loss of the traditional DNNs starts to increase. The autoencoder-based proposed approach does not exhibit this problem, and its test loss approaches zero. It should be kept in mind, however, that the autoencoder is tested on unseen uninfected cell images, whereas the DNNs are tested on unseen infected and uninfected cell images.

After training, the models achieved the results shown in Table 1. To better understand the performance of the proposed approach, four metrics are used: accuracy, precision, recall, and F1 score. A bar chart comparing the models is also displayed in Figure 5. According to both the bar chart and the table, the lowest-performing model is LeNet, with 94.64% accuracy, 94.73% precision, 94.56% recall, and 94.64% F1 score. MobileNetV2 attained the second-highest accuracy and F1 score, at 96.28% and 96.27%, respectively. The best-performing method, however, was the proposed AnoMalNet, which achieved 98.49% accuracy, 97.07% precision, 100% recall, and 98.52% F1 score.
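The four metrics follow from the confusion-matrix counts on the test set. The helper below is a hypothetical sketch, assuming the parasitized class is treated as the positive class; the counts in the usage example are invented for illustration and are not the experimental results.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Return accuracy, precision, recall and F1 score in percent.

    Assumption: the parasitized class is treated as the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
    }


# Hypothetical confusion-matrix counts, for illustration only.
metrics = binary_metrics(tp=90, fp=10, tn=95, fn=5)
for name, value in metrics.items():
    print(f"{name:9s} {value:6.2f}")
```

F1 can equivalently be written as 2TP / (2TP + FP + FN), so with 100% recall (FN = 0) it depends only on the number of false positives.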


Figure 4. Loss vs epoch graph for (a) training data and (b) testing data (curves: ResNet50, LeNet, VGG16, MobileNetV2, and the proposed approach)


Table 1. Comparative study of the performance of the proposed method against other DNN models

Method        Accuracy (%)   Precision (%)   Recall (%)   F1 score (%)
LeNet         94.64          94.73           94.56        94.64
VGG16         96.11          95.40           96.91        96.14
ResNet50      95.64          95.28           96.04        95.66
MobileNetV2   96.28          96.67           95.84        96.27
AnoMalNet     98.49          97.07           100          98.52


Figure 5. A bar chart illustrating the performance of different models (accuracy, precision, recall, and F1 score, in %)

In addition to the comparison with traditional DNN models, the proposed approach is also compared with methods published by other researchers. Table 2 compares the proposed method with these works, showing that the proposed autoencoder-based outlier detection method outperforms these other DNN-based classification techniques.


Table 2. Comparative study of the performance of the proposed method against other published approaches

Method             Accuracy (%)
Narayanan et al.   96.60
Reddy and Juliet   95.40
Raihan and Nahid   94.78
AnoMalNet          98.49


5. CONCLUSION 


This research work presents an autoencoder-based DNN architecture for classifying malaria parasite infections in cell images. The model is trained to identify outliers, i.e., parasite-infected cells. Because it is trained only on normal cell images, the method is advantageous in scenarios where disease-positive samples are scarce. Using the MSE loss and a threshold, the approach correctly identifies images with malaria parasites. Comparisons with traditional DNN models show that the proposed approach performs better than these models. This work also leaves scope for improvement, such as evaluating the model on more complex datasets and extending the task from binary to multi-class classification.